A Fast and Accurate Method for Approximate String Search
Abstract
This paper proposes a new method for approximate string search, specifically candidate generation in spelling error correction: given a misspelled word, the system finds the words in a dictionary that are most “similar” to it. The paper proposes a probabilistic approach to the task that is both accurate and efficient. The approach comprises a log-linear model, a method for training the model, and an algorithm for finding the top k candidates. The log-linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction, conditioned on the misspelled word. The learning method uses the candidate-generation criterion as its loss function. The retrieval algorithm is efficient and is guaranteed to find the optimal k candidates. Experimental results on large-scale data show that the proposed approach improves upon existing methods in accuracy across different settings.
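The candidate-generation setup described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the rules, their weights, the tiny dictionary, and the function names `rewrites` and `top_k` are all hypothetical, and the search is a brute-force enumeration rather than an efficient retrieval structure. It assumes that a candidate is scored by the best sum of rule weights over the rule sets that produce it, and it normalises only over the candidates that were found.

```python
import heapq
import math

# Hypothetical rewrite rules (misspelled substring -> corrected substring)
# with made-up weights; a trained model would learn these from data.
RULES = {
    ("ie", "ei"): 1.0,
    ("c", "l"): -1.0,
}

# Hypothetical toy dictionary.
DICTIONARY = {"receive", "relieve", "recipe"}


def rewrites(word, rules, max_rules=2):
    """Yield (candidate, score) pairs obtained by applying at most
    max_rules non-overlapping rules from left to right.
    The score is the sum of the weights of the applied rules."""
    def expand(pos, budget):
        if pos == len(word):
            yield "", 0.0
            return
        # Option 1: copy the current character unchanged.
        for tail, score in expand(pos + 1, budget):
            yield word[pos] + tail, score
        # Option 2: apply a rule whose left side matches at this position.
        if budget > 0:
            for (alpha, beta), weight in rules.items():
                if word.startswith(alpha, pos):
                    for tail, score in expand(pos + len(alpha), budget - 1):
                        yield beta + tail, weight + score

    yield from expand(0, max_rules)


def top_k(misspelled, dictionary, rules, k=3, max_rules=2):
    """Return up to k (word, probability) pairs, best first."""
    best = {}
    for cand, score in rewrites(misspelled, rules, max_rules):
        # Keep the best-scoring rule set for each dictionary word.
        if cand in dictionary and score > best.get(cand, -math.inf):
            best[cand] = score
    # Log-linear form: probability proportional to exp(score),
    # normalised over the retained candidates only (a simplification).
    z = sum(math.exp(s) for s in best.values())
    ranked = heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
    return [(w, math.exp(s) / z) for w, s in ranked]


if __name__ == "__main__":
    for word, prob in top_k("recieve", DICTIONARY, RULES, k=2):
        print(word, round(prob, 3))
```

Enumerating every rewrite grows quickly with word length and rule count, so this sketch is only suitable for illustrating the scoring, not for large dictionaries.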
Similar resources
A Fast and Accurate Global Maximum Power Point Tracking Method for Solar Strings under Partial Shading Conditions
This paper presents a model-based approach for the global maximum power point (GMPP) tracking of solar strings under partial shading conditions. In the proposed method, the GMPP voltage is estimated without any need to solve numerically the implicit and nonlinear equations of the photovoltaic (PV) string model. In contrast to the existing methods in which first the locations of all the local pe...
A Fast and Accurate Expansion-Iterative Method for Solving Second Kind Volterra Integral Equations
This article proposes a fast and accurate expansion-iterative method for solving second kind linear Volterra integral equations. The method is based on a special representation of vector forms of triangular functions (TFs) and their operational matrix of integration. By using this approach, solving the integral equation reduces to solving a recurrence relation. The approximate solution of integra...
Fast Approximate String Matching with Suffix Arrays and A* Parsing
We present a novel exact solution to the approximate string matching problem in the context of translation memories, where a text segment has to be matched against a large corpus, while allowing for errors. We use suffix arrays to detect exact n-gram matches, A* search heuristics to discard matches and A* parsing to validate candidate segments. The method outperforms the canonical baseline by a...
A Consistent and Accurate Numerical Method for Approximate Numerical Solution of Two Point Boundary Value Problems
In this article we propose an accurate finite difference method for the approximate numerical solution of second order boundary value problems with Dirichlet boundary conditions. There are numerous numerical methods for solving these boundary value problems. Some of these methods are more efficient and accurate than others, with some advantages and disadvantages. The results in experiment on model...
FAMOUS: Fast Approximate string Matching using OptimUm search Schemes
Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. The introduction of practical bidirectional indices has opened new possibilities for solving the problem as they allow the search to be started from anywhere within the pattern and extended in both directions. In particular, use of search sch...
Publication date: 2011